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Abstract 

It  was  established  in  [4,  5]  that  importance  sampling  algorithms  for 
estimating  rare-event  probabilities  are  intimately  connected  with  two- 
person  zero-sum  differential  games  and  the  associated  Isaacs  equation. 
The  purpose  of  the  present  paper  and  a  companion  paper  [6]  is  to 
show  that  the  classical  sense  subsolutions  of  the  Isaacs  equation  can 
be  used  as  a  basic  and  flexible  tool  for  the  construction  and  analysis 
of  efficient  importance  sampling  schemes.  The  importance  sampling 
algorithms  based  on  subsolutions  are  dynamic  in  the  sense  that  during 
the  course  of  a  single  simulation,  the  change  of  measure  used  at  each 
time  step  may  depend  on  the  outcome  of  the  simulation  up  until  that 
time.  While  [6]  focuses  on  a  theoretical  aspects,  the  present  paper  dis¬ 
cusses  explicit  methods  of  constructing  subsolutions,  implementation 
issues,  and  simulation  results. 
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1  Introduction  and  background 


It  was  established  in  [4,  5]  that  importance  sampling  algorithms  for  estimat¬ 
ing  rare-event  probabilities  or  functionals  that  are  largely  determined  by 
rare-events  are  closely  related  to  deterministic  differential  games.  More  pre¬ 
cisely,  the  asymptotic  optimal  performance  of  importance  sampling  schemes 
can  be  characterized  by  the  value  function  of  a  two-person  zero-sum  differ¬ 
ential  game,  which  can  in  turn  be  characterized  by  the  solution  to  the  Isaacs 
eqnation  (a  nonlinear  PDE)  associated  with  the  game.  It  was  also  discussed 
in  [4,  5]  that  one  can  construct  asymptotically  optimal  importance  sampling 
algorithms  based  on  this  solution. 

The  purpose  of  the  present  paper  and  a  companion  paper  [6]  is  to  ex¬ 
plore  this  connection  in  fnrther  depth.  A  main  new  featnre  is  that  it  is 
possible  to  constrnct  efficient  importance  sampling  schemes  based  on  subso¬ 
lutions  of  the  Isaacs  equation.  This  leads  to  a  more  general  class  of  schemes 
and  lends  great  flexibility  to  the  designer.  We  will  see  that  one  can  often 
construct  subsolutions  that  are  strncturally  simpler  than  the  actual  solu¬ 
tion,  but  which  correspond  to  asymptotically  optimal  (or  at  least  nearly 
asymptotically  optimal)  importance  sampling  schemes  that  reflect  this  sim¬ 
pler  strnctnre.  This  simplicity  is  important,  since  one  is  often  interested  in 
properties  other  than  jnst  asymptotic  optimality,  e.g.,  ease  of  construction 
and  ease  of  implementation.  The  companion  paper  [6]  focnses  on  theoretical 
aspects  of  this  approach,  and  proves  a  basic  resnlt  on  the  asymptotic  perfor¬ 
mance  of  importance  sampling  schemes  that  are  based  on  subsolutions.  In 
contrast,  the  present  paper  will  focus  on  methods  for  constructing  snbsolu- 
tions  and  their  application  to  various  process  models,  events,  and  expected 
values. 

The  theory  in  [6]  assnmes  a  setting  that  includes  as  special  cases  both 
the  sum  of  independently  identically  distributed  (iid)  random  variables  and 
the  empirical  measure  of  a  hnite-state  Markov  chain.  However,  its  potential 
application  is  much  broader,  and  includes,  for  example,  systems  with  state 
dependencies,  systems  with  constrained  dynamics,  and  expected  values  in¬ 
volving  path-dependent  events.  Indeed,  while  many  examples  discnssed  in 
the  present  paper  do  not  ht  directly  into  the  theoretical  framework  of  [6], 
we  will  see  that  efficient  importance  sampling  algorithms  can  be  designed 
via  snbsolutions. 

The  paper  is  organized  as  follows.  In  Section  2  we  focus  on  a  particu¬ 
lar  problem  of  interest:  importance  sampling  for  sums  of  a  functional  of  a 
Markov  chain.  This  is  a  special  case  of  the  model  studied  in  [6].  We  hrst  re¬ 
call  the  Isaacs  equation  that  is  appropriate  for  this  problem,  and  then  review 
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the  notion  of  a  suh solution /control  pair  for  this  eqnation.  Such  pairs  define 
importance  sampling  schemes,  and  we  also  review  the  theoretical  bounds  on 
performance  one  obtains  from  a  subsolution.  Section  3  discusses  methods  for 
the  construction  of  subsolutions  for  this  particular  problem  in  great  detail, 
starting  with  the  simplest  possible  examples  and  then  extending  to  more 
complicated  situations.  Section  4  is  devoted  to  examples.  Whenever  possi¬ 
ble,  we  have  chosen  examples  for  which  there  is  an  alternative  method  to 
compute  (at  least  approximately)  the  true  value.  The  only  exception  is  the 
example  of  Section  4.5.  The  first  part  of  Section  4  the  applies  the  techniques 
of  Section  3  to  problems  involving  sums  of  random  variables,  and  presents 
simulation  results  for  the  corresponding  schemes.  However,  the  use  of  sub¬ 
solution  methods  extends  far  beyond  the  simple  setting  of  sums  of  random 
variables.  For  example,  the  paper  [3]  shows  how  to  apply  subsolutions  to 
construct  and  analyze  importance  sampling  schemes  for  stochastic  networks. 
The  second  part  of  Section  4  considers  several  other  types  of  problems  to 
which  the  approach  can  be  applied,  including  multi-dimensional  level  cross¬ 
ing  problems,  probabilities  and  expected  values  that  depend  on  a  sample 
path  of  a  process,  and  an  interesting  mixed  open/closed  queueing  network. 
Although  the  theoretical  analysis  of  these  problems  is  not  covered  by  [6],  in 
each  case  the  application  of  the  Isaacs  equation  and  subsolution  approach 
follows  the  pattern  laid  out  for  the  simple  case  of  sums  of  random  variables. 
The  examples  are  representative,  but  far  from  exhaustive.  To  streamline 
the  presentation  the  details  of  certain  constructions  are  postponed  to  an 
appendix. 


2  Review  of  the  main  theoretical  results 

This  section  reviews  the  theoretical  results  in  [6].  To  ease  exposition,  we 
specialize  to  Markov  chains  and  leave  out  various  technical  assumptions  on 
the  underlying  processes  and  relevant  functionals.  See  [6]  for  these  details. 


2.1  Basic  setup 

Let  Y  =  {Yi,i  G  No}  denote  a  Markov  chain  with  state  space  S.  Assume 
that  S'  is  a  Polish  space,  and  let  p{y,  dz)  denote  the  probability  transition 
kernel.  Our  interest  is  in  sums  of  the  form 


^  n—1 

Xn  =  -Tg{Yi), 
n 


i=0 
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where  g  :  S  —>■  is  a,  given  measurable  function. 

An  important  component  of  the  analysis  is  the  eigenvalue/eigenvector 
problem  associated  with  the  non-negative  kernel  exp(a,  g{z))p{y,  dz)  for  an 
arbitrary  a  G 


e("’®(^)>r(2:;  a)p{y,  dz)  =  a). 


(2.1) 


It  follows  from  (2.1)  that,  for  each  a  G 


P{y,dz-oi)  = 

r{y;a) 


(2.2) 


dehnes  a  probability  transition  kernel.  The  large  deviation  rate  function  L 
associated  with  the  process  {Xn}  can  be  expressed  in  terms  of  the  eigenvalue 
H.  Indeed,  L  is  the  convex  conjugate  of  H,  that  is,  for  each  (3  G 


L(/3)  =  sup  [(a, (3)  —  H{a)\. 

aSK'* 


2.2  The  Isaacs  equation  and  subsolution 

Suppose  one  wishes  to  estimate  the  expected  value  of  certain  functionals  of 
Xn  for  large  n,  using  importance  sampling  algorithms  based  on  changes  of 
measure  of  the  form  (2.2).  It  follows  from  [5]  that  the  optimal  performance 
of  such  schemes  can  be  characterized  by  the  Isaacs  equation 

Wt  +  sup  inf  m{DW;  a,P)  =  Q  (2.3) 


with  a  suitable  terminal  condition.  Here  W  :  (x,  t)  G  x  [0, 1]  — >  M,  Wt 
denotes  the  partial  derivative  with  respect  to  t,  DW  the  gradient  in  x,  and 

BI(s;  a,  (3)  =  (s,  (3)  +  L{(3)  +  (a,  (3)  -  H (a)  (2.4) 


for  s,a,f3& 

In  order  to  construct  good  importance  sampling  schemes,  one  does  not 
need  the  solution  to  the  Isaacs  equation.  It  turns  out  that  Ending  a  good 
subsolution  (more  precisely,  a  subsolution/control  pair)  is  often  sufficient. 
We  need  the  following  definition.  Given  AT  G  N,  consider  a  function  W  : 

X  [0, 1]  — >  M  and,  for  1  <  A:  <  AT,  functions  pfc  :  x  [0, 1]  ^  M, 

Afc  :  X  [0, 1]  M'^. 
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Definition  2.1  The  collection  iW ,  pk-,ak)  will  called  a  generalized  sub¬ 
solution/control  if  the  following  conditions  hold,  x  [0, 1]  — >  M,  A:  = 

1, ...  ,K  is  a  partition  of  unity,  i.e.,  each  pk  is  non-negative,  and 

K 

'^Pk{x,t)  =  1 
k=l 

for  all  {x,  t)  G  X  [0, 1] .  Wt  and  DW  have  representations 
K  K 

Wt{x,  t)  =  '^  Pk{x,  t)rk{x,  t),  DW {x,  t)  =  '^  Pk{x,  t)sk{x,  t), 

k=l  k=l 


and  for  each  k  =  1, . . . ,  K 

rkix,t)+  inf  E.{sk{x,t)-,akix,t),P)  >0. 
pmd- 

The  functions  {rk,Sk,  PkWk)  o.re  uniformly  bounded  and  Lipschitz  continu¬ 
ous. 

For  any  generalized  subsolution/control  iW ,  pk,  oik)-,  it  is  not  difficult  to 
show  that 

Wt  +  sup  inf  n{DW-,  a,  (3)  >  0. 

QgRd  P&M.d 

In  other  words,  IF  is  a  classical  subsolution  to  the  Isaacs  equation  (2.3). 
It  will  turn  out  that  the  Q;fc)-component  determines  the  change  of  mea¬ 
sure  used  in  importance  sampling,  and  so  we  use  terminology  “subsolu¬ 
tion/control.” 

Remark  2.1  For  the  special  case  where  K  =  1,  and  with  notation  a  =  ai, 
we  simply  write  (IF,  a)  and  call  it  a  subsolution/control  pair. 

Remark  2.2  Suppose  that  IF  is  a  classical  subsolution  to  the  Isaacs  equa¬ 
tion  (2.3),  that  is 


IFt  +  sup  inf  BI(DIF;  a, /?)  >0. 

QgRd  /3eiR‘^ 

Let  a*{x,t)  be  the  supremizer  in  the  above  display  [indeed,  one  can  easily 
identify  a*{x,t)  =  —DW{x,t)/2].  Then  (IF, a*)  is  a  subsolution/control 
pair,  provided  that  a*  is  uniformly  bounded  and  Lipschitz  continuous. 
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2.3  Importance  sampling  algorithms  based  on  subsolutions 

In  this  section  we  describe  the  importance  sampling  algorithms  associated 
with  a  given  generalized  subsolution/control  (14^,  d^).  For  fixed  n,  let 


ak,j{x)  =  ak{x,j/n),  pk,j{x)  =  pk{x,j/n). 


Processes  {Xj}  and  {Yj}  are  constructed  recursively  as  follows.  Let  Xq  =  0 
and  Yq  =  Yq.  Given  that  Xj  =  x  and  Yj  =  y,  simulate  under  the 
distribution 

K 

'^Pk,j{x)P{y,dz]ak,j{x)) .  (2.5) 

k=l 

In  other  words,  one  first  simulates  a  multinomial  random  variable  I  taking 
values  in  {l,2,...,iL}  such  that  P{I  =  k}  =  pkj{x),  and  then  simulates 
Yj-^-i  from  the  distribution  P{y,  dz]  aij{x)).  Finally,  update  the  state  dy¬ 
namics  by  letting 

Xj+i  =  Xj  +  -g{Yj+i). 

An  unbiased  importance  sampling  estimator  can  then  be  obtained  by 
multiplying  the  functional  of  interest  by  a  Radon-Nikodym  derivative  (i.e., 
likelihood  ratio).  For  example,  suppose  we  are  interested  in  estimating 
E[G{Xn)]  for  some  function  G.  A  single  sample  of  the  unbiased  importance 
sampling  estimator  is 


Z 


n 


n—1 


GiXn)  n 


j=0  Lk=l 


K 


Pk,j  {Xj 


r{Yj+i]ak,j{Xj)y 
r{yj-,ak,j{Xj))  _ 


(2.6) 


The  importance  sampling  algorithm  is  then  just  the  sample  average  of  a 
number  of  independent  replications  of  Z”. 


Remark  2.3  In  most  of  the  applications  considered  in  this  paper,  one  can 
construct  a  generalized  subsolution/control  {W,  Pk,(ik)  where  the  ak  are  all 
constants  and  with  K  of  moderate  size.  This  has  a  distinct  advantage  in 
numerical  implementation.  For  example,  in  the  Markov  chain  case,  to  com¬ 
pute  a  change  of  measure  one  often  needs  to  numerically  solve  the  eigen¬ 
value/eigenvector  problem.  If  ak  is  not  a  constant,  one  needs  to  solve  eigen¬ 
value/eigenvector  problems  over  and  over  again,  depending  on  the  current 
state  of  the  simulation.  This  could  become  very  computationally  demand¬ 
ing. 
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2.4  The  main  theoretical  result 


To  simplify  the  exposition,  we  specialize  for  now  to  the  case  where  the 
quantity  of  interest  is 

E[G{Xn)]  =  =  P{Xn  G  Gl}. 

for  some  Borel  set  yl  C  and  assume  the  large  deviation  limit 

lim  —  logP{X„  G  4.}  =  —  inf  L{/3). 

n^oo  n  p&A 

For  convenience  let  7  =  inf,gg^  -h(/3). 

The  following  result  is  a  special  case  of  Theorem  7.1  of  [6]. 

Theorem  2.2  Let  (IT,  pk,  otk)  he  a  generalized  subsolution/ control  such  that 

W{x,l)<0  (2.7) 


for  all  X  G  A.  Let  be  the  second  moment  of  the  associated  importance 
sampling  estimator,  that  is, 

V^  =  E  , 


where  is  defined  in  (2.6).  Let 


IT”  =  --logT”. 
n 

Then 

lim  inf  IT”  >  1T(0,0). 


The  theorem  provides  a  lower  bound  for  the  exponential  decay  rate  of  the 
second  moment  of  the  importance  sampling  estimator  Z”.  An  upper  bound 
that  is  independent  of  the  importance  sampling  scheme  also  holds.  Indeed, 
consider  any  importance  sampling  estimator  one  might  use  to  approximate 
P{Xn  G  A},  and  analogous  to  IT”,  let  IT”  denote  the  normalized  logarithm 
of  its  second  moment.  Then  an  elementary  calculation  based  on  Jensen’s 
inequality  shows  that 

lim  sup  IT”  <  27. 

n—*oo 

Thus  a  natural  goal  in  importance  sampling  design  is  to  come  as  close  to 
this  upper  bound  as  possible.  In  other  words,  we  would  like  to  find  a  sub¬ 
solution/control  for  which  IT(0,  0)  is  close  to  2'y.  Any  importance  sampling 
estimator  that  achieves  this  maximal  exponential  decay  rate  is  called  as¬ 
ymptotically  optimal. 
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Remark  2.4  Theorem  2.2  holds  under  much  greater  generality  [6].  For 
example,  suppose  one  is  interested  in  the  expectation 

E[G{Xn)]  =  F;[exp{-uT(X„)}] 

for  some  function  F.  A  result  analogous  to  Theorem  2.2  holds  except  that 
the  terminal  condition  as  in  equation  (2.7)  is  replaced  by 

W{x,  1)  <  2F{x) 


for  all  X  G 

2.5  Discussion 

The  use  of  subsolutions  is  applicable  in  much  broader  settings  than  those 
discussed  in  this  section,  or  even  those  in  [6].  For  instance,  an  interesting 
class  of  problems  is  to  estimate  the  probabilities  of  path  dependent  events 
(say,  events  that  depend  on  the  extrema  of  the  sample  path).  In  this  case, 
the  subsolution  needs  to  satisfy  certain  boundary  conditions  besides  the  ter¬ 
minal  condition.  Another  interesting  class  of  problems  where  the  use  of 
subsolutions  appears  crucial  involve  probabilities  associated  with  systems 
with  constrained  dynamics  (e.g.,  queuing  networks  [3]),  where  the  bound¬ 
ary  condition  and/or  terminal  condition  may  take  more  complicated  forms. 
Depending  on  the  class  of  changes  of  measure  used  and  the  dynamics  of  the 
system,  the  Isaacs  equation  may  take  a  different  form  from  the  one  used 
here.  However,  it  is  always  the  case  that  the  use  of  subsolutions  is  criti¬ 
cal  to  the  construction  of  importance  sampling  schemes.  Section  4  presents 
a  few  such  examples  even  though  they  are  not  covered  by  the  theoretical 
framework  of  [6]. 

3  Construction  of  generalized  subsolution/controls 

To  illustrate  how  one  can  construct  generalized  subsolution/controls,  we 
specialize  to  the  setup  of  Section  2  and  assume  that  the  quantity  of  interest 
is 

F[G{Xn)]  =  =  P{Xn  G  Gl}. 

The  extrapolation  to  more  general  setups  will  also  be  considered  later  in  the 
paper. 

There  are  several  concerns  in  the  construction.  To  begin,  the  terminal 
condition  W{x,l)  <  0  for  x  G  A  must  be  satisfied  in  order  for  Theorem 


2.2  to  be  valid.  Secondly,  one  wishes  1^(0, 0)  to  be  as  close  as  possible  to 
the  optimal  decay  rate  27.  Finally,  one  would  like  the  controls  {pk,^k)  to 
take  simple  forms,  since  this  gives  importance  sampling  algorithms  that  are 
simpler  and  easier  to  implement. 

Our  construction  is  as  follows.  We  first  identify  a  family  of  particularly 
simple  subsolution/control  pairs  to  the  Isaacs  equation  (2.3).  For  each  of 
these  pairs,  say  (W,a),  W  is  affine  in  {x,t)  and  a  takes  constant  value. 
W  will  actually  be  a  solution  to  the  Isaacs  equation  alone  (i.e.,  with  the 
terminal  condition  unspecified).  This  family,  denoted  from  now  on  by  A, 
will  contain  the  building  blocks  for  our  construction.  It  is  the  appropriate 
family  for  the  class  of  problems  under  consideration,  where  the  Hamiltonian 
H  does  not  depend  on  x. 

If  the  terminal  condition  (2.7)  is  satisfied  by  an  element  of  A  and  if 
IT(0,0)  is  sufficiently  close  to  27,  then  the  construction  is  complete.  This 
is  always  possible  when  A  is  convex  (see  Case  I  in  Section  3.2.1  below). 
When  this  is  not  possible,  it  is  often  possible  to  take  a  finite  collection  of 
such  pairs,  say  {{Wk-,  dfc),  k  =  1,2, ,  K},  so  that  the  minimum  of  {Wk} 
satisfies  the  terminal  condition  (2.7)  and  A^;^ITfc(0, 0)  equals  27. 

Since  the  each  Wk  is  a  classical  subsolution,  their  minimum  is  a  weak 
sense  subsolution,  but  not  a  classical  sense  subsolution.  However,  there  are 
several  simple  and  easily  implemented  methods  for  mollifying  A^j^ITfc  to 
produce  a  generalized  subsolution/control  iW ,  pk,Oik).  These  are  discussed 
in  Section  3.3. 

3.1  Affine  solutions  to  the  Isaacs  equation 

In  this  section  we  identify  A,  the  collection  of  affine  subsolution/control 
pairs  to  the  Isaacs  equation  (2.3). 

For  any  given  a  G  and  c  G  M,  consider  the  affine  function 

W {x,  t)  =  —2{a,  x)  +  c  —  2(1  —  t)H{oi). 

We  claim  that  (IT,  d)  is  a  subsolution/control  pair.  Indeed,  thanks  to  the 
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convex  conjugacy  between  H  and  L, 

Wt  +  inf  moW]  d,  13) 

0m.d 

=  Wt-h  inf  l(DW,^)  +  L(/3)  +  (a,/3}-H(a)] 

=  2ff(a)+  inf  [(—2a,/3}+L(/3)  +  (a,/3}—ff(a)] 

/3eK'^ 

=  11(a)  +  inf  [L(j3)  -  (a,  [3)] 

/3eR‘* 

=  0. 

Let  A  be  the  collection  of  all  such  functions,  that  is 

A=  |(iy,d)  :  W  =  -2(a,x) +c- 2(1 -t)II(a),  a  e  m}  . 

Remark  3.1  It  is  not  difficult  to  show  that  for  every  (W,  a)  G  A,  the  affine 
function  W  is  indeed  a  solution  to  the  Isaacs  equation  (2.3). 

3.2  Piecewise  afRne  viscosity  subsolutions 

As  mentioned  above,  the  technique  used  to  construct  a  generalized  subsolu¬ 
tion/control  requires  finding  a  finite  collection  of  affine  subsolution/control 
pairs  in  A  such  that  their  minimum  satisfies  the  appropriate  terminal  condi¬ 
tion  and  takes  a  large  value  at  (0,  0)  [preferably  27,  the  optimal  exponential 
decay  rate].  We  begin  in  this  section  with  the  simplest  examples. 

3.2.1  Example:  Estimating  P{Xn  G  A} 

Consider  the  setup  in  Section  2.1,  and  suppose  that  one  wishes  to  estimate 
P{Xn  G  A}  for  some  Borel  set  A  C  Throughout  this  section  we  assume 

inf  L(P)  =  ml_L(P)  G  (0,oo), 

0eA°  0eA 

where  A°,A  denote  the  interior  and  the  closure  of  A,  respectively.  It  follows 
that  the  exponential  decay  rate  is 

0&A  0£A°  0£A 

Let  f3*  G  A  denote  a  minimizer  of  L  over  A.  Let  a*  satisfy 

L(P*)  =  sup  [(«,/?*)  -  H(a)]  =  (a*,  13*)  -  H(a*). 
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a*  is  called  the  convex  conjugate  of  P* . 

Case  1.  Consider  the  simplest  case  where  yl  is  a  convex  set.  Thanks  to  the 
convexity  of  A,  the  vector  —a*  defines  an  outward  normal  of  A.  It  follows 
that 

A  C  {x  eW'- :  {x,a*)  >  {P*,a*)}.  (3.1) 

Consider  the  element  of  A  with  a  =  a*  and  c  =  2(/3*,q;*),  i.e., 

W{x,  t)  =  -2{a*,x)  +  2{P*,a*)  -  2(1  -  t)H{a*)- 
Thanks  to  the  inequality  (3.1),  for  each  x  G  A 

lT(x,  1)  =  -2{a\x)  +  2{P*,a*)  <  0. 

Therefore,  when  A  is  convex,  (IT,  a*)  provides  a  simple  subsolution/control 
pair  that  satisfies  the  terminal  condition.  The  value  at  (0, 0)  is 

1T(0,  0)  =  2{P*,a*)  -  2H{a*)  =  2L{P*)  =  2j, 

the  largest  possible  value.  Note  that  the  analysis  holds  if  we  replace  the 
convexity  assumption  on  A  by  the  assumption  that  (3.1)  holds. 

Case  2.  More  generally,  suppose  that  for  some  AT  G  N, 

A  C  Uk=i  {x  :  (x,  ak)  >  (/3fc,  Ofc)} 

where  Pk  and  ak  are  convex  conjugates,  and  that  for  each  k  L{Pk)  >  7.  A 
necessary  and  sufficient  condition  for  these  two  assumptions  to  hold  is  that 
A  should  be  contained  in  the  union  of  a  finite  number  of  half-spaces,  and 
that  the  infimum  of  L  on  each  of  these  half  spaces  is  at  least  7.  In  this  case 
Pk  can  be  taken  as  the  point  on  the  kth  half  space  that  minimizes  L,  and 
we  have  7  =  L{Pk)  for  some  k.  Several  of  the  numerical  examples  in  Section 
4  will  fall  into  this  category. 
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Figure  1.  Example  of  a  non-convex  A  with  a  two-piece  subsolution. 

Define  an  affine  subsolution/control  pair  {Wk-,Cik)  by 

Wk{x,t)  =  -2{ak,x)  +  2{ak,Pk)  -  2(1  -  t)H{ak) 
for  each  k  =  1, ...  ,K.  Consider  the  pointwise  minimum 

W{x,t)  =  f\k=l^k{x,t)- 

As  we  have  pointed  out,  W  defines  a  weak  sense  subsolution  to  the  Isaacs 
equation.  The  terminal  condition  (2.7)  is  satisfied.  Since  for  each  x  G  A 
{x,ak)  >  {13k,  (Xk)  for  some  k,  we  have 

W (x,  1)  <  Wk{x,  1)  =  -2  (x,  ak)  +  2{Pk,  ak)  <  0. 

Finally,  we  observe  that 

W{0,  0)  =  AtiWkiO,  0)  =  2  Ati  m,  ak)  -  2H{ak)]  =  2  Af=i  L{Pk)  =  2j. 

The  last  equality  holds  since  L{f3k)  >  7  for  each  k  and  with  equality  for 
some  k. 
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3.2.2  Example:  Estimating  Eexp{— nF(X„)} 

Consider  the  setup  in  Section  2.1,  and  suppose  that  one  wishes  to  estimate 
Eexpj— nF(X„)}  for  some  measurable  function  F  :  ^  M  U  {+oo}.  Let 

7  =  inf^gjgd  [-^(/3)  +  L{f5y\,  and  assume  that 

lim  —  logEexpl— =  — 7. 

n^oo  Ti 

This  large  deviation  limit  holds  under  various  sets  of  regularity  conditions 
on  F  [2].  In  this  case  the  terminal  condition  (2.7)  is  replaced  by 

W{x,  1)  <  2F{x) 


for  X  (see  Remark  2.4). 

The  development  here  parallels  the  probability  case  as  described  in  the 
previous  section.  For  simplicity,  we  assume  that  there  exists  (3*  that  min¬ 
imizes  L{P)  -t-  F{P)  over  (3  G  and  let  a*  be  its  convex  conjugate.  To 
avoid  technicalities,  we  also  assume  that  L  is  differentiable  at  P* ,  and  thus 

a*  =  DL{P*). 


Case  1.  We  first  consider  the  simplest  case  where  F  is  convex.  Consider 
an  affine  subsolution/control  pair  (IF,  a*)  where 

W{x,  t)  =  -2  {a*,x)  +  2[F{P*)  +  {a*,P*)]  -  2(1  -  t)H{a*). 

Since  /?*  is  a  minimizer  of  L{x)  +  F(x),  we  have 

OediL  +  F){P*), 

where  d  denotes  the  set  of  subdifferentials.  Therefore 

-a*  =  -DL{P*)  G  dF{P*). 

It  follows  that  the  affine  function  W{x,  1)  is  a  supporting  hyperplane  to  2F 
at  /?*,  and  hence 

W{x,  1)  <  2F{x) 

for  every  x,  i.e.,  the  terminal  condition  holds.  Also  observe  that 

1F(0, 0)  =  2F{P*)  +  2  {a*,P*)  -  2H{a*)  =  2F{P*)  +  2L{P*)  =  27. 
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Case  2.  Next  suppose  that  F  is  no  longer  convex.  If  there  exists  a  convex 
function  G  such  that  G  <  F,  G{[3*)  =  F{j3*),  and 


inf^  [L{13)  +  G{(3)]  =  inf^  [L(/3)  +  F(/3)]  =  7, 


then  we  reduce  to  the  previous  case.  More  generally,  suppose  there  exist 
convex  functions  Gk,k  =  1, . . . ,  K,  such  that 

AtiGk  <  F,  (3.2) 


and  for  each  k, 


inf  [L(/3)  +  Gkm  >  inf  [L(/3)  +  F{f3)]  . 


(3.3) 


If  each  Gk  is  bounded  from  below  and  lower  semicontinuous  then  a  minimizer 
/3fc  of  L(/3)  +  Gk{P)  will  exist,  and  we  can  define  the  weak  sense  subsolution 

W{x,t)  =  Ak=iWk{x,t), 

where  {Wk,ak)  is  the  affine  subsolution/control  pair  with 

Wk{x,  t)  =  -2  {ak,x)  +  2[Gkm  +  («A:,  /?fc)]  -  2(1  -  t)H{ak). 

The  same  argument  as  in  Case  1  shows  that  the  terminal  condition  W{x,l)  < 
2F{x)  is  satisfied,  and  we  have 

IT(0,0)  =  Af=i[T(/3fc)  +  G{(3k)]  >  27. 

Actually,  the  equality  holds,  since  (3.2)  implies  (3.3)  must  hold  as  an  equality 
for  some  k. 


3.3  Mollification 

As  discussed  previously,  once  a  weak  sense  subsolution  is  identified  as  the 
pointwise  minimum  of  a  collection  of  affine  subsolution/control  pairs,  mol¬ 
lification  is  used  to  produce  generalized  subsolution/controls. 

Let  {Wk,  Ofe)  G  A,  A:  =  1,  2  . . . ,  AT,  be  affine  subsolution/control  pairs.  We 
use  a  standard  numerical  approximation  which  we  call  exponential  weighting 
for  W{x,t)  =  /\^^iWk{x,t).  Let  5  be  a  small  positive  number,  and  define 

/  K 

W\x,t)  =  -<51og 

Vfc=i 
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For  1  <  i  <  iF,  let 


Then  one  can  easily  verify 

K 

DW\x,t)  =  ^pl{x,t)DWk{x,t) 

k=l 

and 

K 

Wf{x,  t)  =  Y^  pi{x,  t)  (iTfc)^  (x,  t), 

k=l 

and  so  takes  the  form  prescribed  for  a  generalized  subsolution/control 
with  dfe(x,  t)  =  ak-  It  is  obvious  that  the  three  functions  Sk{x,  t)  =  DWk{x,  t)  = 
—2ak,  rk{x,t)  =  {Wk)^  {x,t)  =  2H{ak)  and  ak{x,t)  are  uniformly  bounded 
and  Lipschitz  continuous,  and  it  is  easy  to  check  that  the  same  is  true  with 
regard  to  p|(x,  f)  for  each  fixed  6  >  0.  Therefore,  {W^,pl,ak)  is  a  general¬ 
ized  subsolution/control. 

For  a  fixed  {x,t)  let  I  satisfy  W{x,t)  =  Wi{x,t).  Then 

g-jTTfe(x,t)  ^  ^-jWi{x,t) 

iov  k  =  I, K .  Therefore 

K  _ 

k=l 

which  implies 

Wi{x,t)  >  W^{x,t)  >  Wi{x,t)  —  6logK. 

Since  I  achieves  the  minimum,  for  all  (x,  t) 

W (x,  t)  >  IF^(x,  t)  >W (x,  t)  —  6 log  K. 

Thus  if  W  satisfies  a  given  terminal  condition  then  so  will  ,  though 
1T^(0, 0)  >  IT(0,0)  —  8\ogK  implies  there  may  be  a  small  reduction  of  the 
value  at  (0,0). 

It  is  important  to  observe  is  that,  as  equation  (2.5)  indicates,  the  sub¬ 
solution  itself  does  not  play  any  explicit  role  in  the  computation  of  the 
change  of  measure  and  the  algorithm  is  completely  determined  by  (/?^,Q;fc). 
However,  the  function  characterizes  the  performance  of  the  correspond¬ 
ing  importance  sampling  algorithm  through  results  such  as  Theorem  2.2. 
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Remark  3.2  Recall  that  mollification  will  possibly  results  in  a  small  re¬ 
duction  in  the  value  1R(0, 0).  The  inequality  <  W  suggests  one  could 
perhaps  satisfy  the  terminal  condition  by  considering  -t-  c  instead  of 
for  some  c  >  0,  which  would  reduce  the  loss  at  (0,0).  However,  this  will 
not  eliminate  the  gap  in  all  circumstances.  It  is  worth  noting  that  one  can 
construct  a  sequence  of  schemes  indexed  by  n  which  achieves  the  theoretical 
bound  on  performance  if  one  chooses  5  0  as  n  ^  oo  in  an  appropriate 

way.  We  will  not  pursue  this  issue  here,  since  our  computational  experience 
suggests  it  is  not  needed  to  obtain  good  performance.  However,  the  inter¬ 
ested  reader  can  consult  [3]  for  the  precise  statement  and  further  details  in 
the  context  of  stochastic  networks. 


Remark  3.3  Exponential  weighting  is  not  the  only  way  to  achieve  mollifi¬ 
cation.  For  example,  one  can  mollify  W{x,t)  =  by  integration 

against  a  smooth  convolution  kernel,  for  which  a  standard  choice  is 


rj{x) 


Cexpll/dlrcp  —  1)},  if  ||x||  <  1, 

0  ,  if  ||x||  >  1, 


where  C  is  the  normalizing  constant  so  that  the  integral  of  rj  over  is  one 
[7,  Section  7.2].  However,  we  do  not  recommend  this  method  since  in  this 
case  the  weights  t)}  involve  integrations  that  can  be  computationally 

demanding.  In  contrast,  the  weights  are  very  easy  to  compute  when  using 
the  exponential  weighting  mollification.  In  fact,  for  all  the  numerical  exam¬ 
ples  in  Section  4  where  mollification  is  needed,  if  one  mollifies  by  integration 
against  the  convolution  kernel  rj,  the  resulting  importance  sampling  schemes 
will  yield  very  similar  estimates  and  standard  errors  (for  the  same  sample 
size)  as  those  based  on  the  exponential  weighting  mollification,  even  though 
the  latter  is  much  faster. 


3.4  Discussion 

The  construction  of  generalized  subsolution/controls  can  be  extended  in 
many  ways  and  to  many  other  situations.  Depending  on  the  problem  at 
hand,  the  Isaacs  equation,  the  terminal  condition,  and  the  boundary  con¬ 
ditions  (if  any)  may  take  different  forms  (see  Section  2.5).  For  example, 
when  computing  escape  probabilities  with  a  stochastic  network  one  is  par¬ 
ticularly  interested  in  combining  subsolutions  so  as  to  satisfy  appropriate 
boundary  conditions  [3].  In  other  problems,  one  way  wish  to  expand  A  so 
that  it  also  includes  {W ,  a)  where  IF  is  a  strict  subsolution  to  the  Isaacs 
equation,  or  even  a  non-afhne  subsolution  of  some  specific  form.  However, 
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the  basic  structure  for  constructing  an  importance  sampling  scheme  remains 
the  same:  We  identify  a  class  of  subsolution/control  pairs  which  correspond 
to  changes  of  measures  of  simple  form,  and  use  these  pairs  as  the  building 
blocks  for  generalized  subsolution/controls. 


4  Examples  of  importance  sampling  algorithms 

In  this  section  we  give  examples  of  importance  sampling  algorithms  based  on 
subsolutions.  As  noted  in  the  Introduction,  some  of  these  examples  are  not 
covered  by  the  theoretical  framework  in  the  companion  paper  [6].  The  role 
of  these  examples  is  to  demonstrate  the  broad  applicability  of  subsolutions 
in  importance  sampling.  Unless  specified  otherwise,  the  importance  sam¬ 
pling  algorithm  based  on  a  generalized  subsolution/control  will  follow  the 
description  in  Section  2.3  with  the  new  distribution  determined  by  equation 
(2.5). 


4.1  Example:  Estimating  P{Xn  G  A}  for  convex  A 

Assume  that  {Yi,  12) . . .}  is  a  sequence  of  iid  2-dimensional  N{Q,  I2)  random 
variables,  where  I2  is  the  2x2  identity  matrix.  Let 


= 


n 


i=l 


and  consider  the  estimation  of  P{Xn  G  A}  for  a  convex  set  A  of  the  form 

A  =  {x  G  :  (x  -  o)^  -f  2/^  <  r^}  , 

with  0  <  r  <  a.  Clearly  {Xn  G  A}  is  a  rare  event  for  large  n. 

For  this  model,  H{a)  =  ||ap/2  and  L{f3)  =  ||/9|p/2.  Cramp’s  Theorem 
yields 

lim  -logP{A„  G  A}  =  -  inf  L{/3)  =  -  ^(a  -  r)^  =  -7, 
n^oo  n  I3&A  2 


with  the  minimizing  (3*  =  {a  —  r,  0).  The  convex  conjugate  of  (3*  is  just 
a*  =  {a  —  r,  0). 

As  discussed  in  Section  3.2.1,  the  affine  subsolution/control  pair  (IT,  a*) 
W{x,  t)  =  -2{a*,x)  +  2{/3*,a*)  -  2(1  -  t)H{a*) 
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satisfies  the  terminal  condition  (2.7)  and  1^(0,  0)  =  2L{(5*)  =  27.  By  Theo¬ 
rem  2.2  the  importance  sampling  algorithm  based  on  (IT,  a*)  is  asymptoti¬ 
cally  optimal. 

In  this  example,  (IT,  a*)  induces  an  importance  sampling  scheme  that 
takes  the  simplest  possible  form,  and  indeed  one  that  is  well  known  in  the  im¬ 
portance  sampling  literature.  Let  ix  be  the  underlying  distribution  T(0,l2). 
Then  the  {T}  are  iid  with  distribution 

which  is  the  distribution  of  N{a* ,  12). 

The  table  below  gives  numerical  results.  We  take  a  =  2,  r  =  1,  and  run 
simulations  for  the  cases  n  =  25,  50, 100.  Each  estimate  consists  of  20,000 
replications.  The  theoretical  value  (which  is  available  from  the  standard 
statistics  software  S-plus)  is  presented  for  comparison.  The  standard  error  is 
also  a  numerical  estimate,  and  C.I.  stands  for  “confidence  interval,”  though 
this  is  only  formal  since  we  make  no  assertion  regarding  normality  of  errors. 


n  =  25 

n  =  50 

n  =  100 

Theoretical  value 

1.99  X  10"^ 

5.39  X  10-^'^ 

5.36  X  10-^-^ 

Estimate 

2.00  X  10"^ 

5.48  X  10“^^ 

5.44  X  lO"^"^ 

Standard  Error 

0.04  X  10"^ 

0.12  X  10-^^ 

0.14  X  10-^-^ 

95%  C.I. 

[1.92,2.08]  X  10“'' 

[5.24,5.72]  X  10"^^ 

[5.16,5.72]  X  10-^-^ 

Table  1.  Estimating  G  A}  for  convex  A. 

Remark  4.1  This  algorithm  based  on  (IT,  a*)  coincides  with  the  impor¬ 
tance  sampling  based  on  what  one  might  call  the  “standard  heuristic,”  which 
states  that  the  change  of  measure  used  in  the  analysis  of  the  large  deviation 
lower  bound  is  a  good  choice  for  importance  sampling.  As  demonstrated  in 
[9,  10,  4,  5],  the  standard  heuristic  importance  sampling  is  efficient  only  in 
very  special  situations. 

4.2  Example:  Estimating  P{Xn  G  A}  for  non-convex  A 

In  this  section,  we  give  numerical  estimates  of  P{Xn  G  A}  when  A  C 
takes  the  form 

Ac  {x  :  {x,  ai)  >  (/3i,  ai)}  U  {x  :  {x,  02)  >  {P2,  a2)}  ,  (4.1) 

with  ak  G  and  /3fc  G  convex  conjugates  for  A:  =  1,  2. 
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As  discussed  in  Section  3.2.1,  one  can  construct  affine  subsolution/control 
pairs  {Wk,Cik)  with 

Wk{x,t)  =  -2{ak,x)  +  2{ak,Pk)  -  2(1  -  t)H{ak) 
such  that  their  minimum 

W{x,t)  =  Wi{x,t)  A  W2{x,t) 

is  a  (weak  sense)  subsolution  which  satisfies  the  terminal  condition  (2.7). 

Using  the  exponential  weighting  mollification  scheme  presented  in  Sec¬ 
tion  3.3,  for  any  choice  of  8  one  produces  a  generalized  subsolution/control 
(lU^,  pI,  ttfc).  The  importance  sampling  algorithm  is  determined  by  (p|,  a^). 

We  will  present  two  numerical  examples:  one  for  one-dimensional  iid 
normal  random  variables,  the  other  for  a  two-dimensional  finite  state  Markov 
chain.  Both  examples  have  already  appeared  [4,  5].  The  difference  is  that  in 
these  papers  the  importance  sampling  algorithm  was  based  on  the  solution 
of  the  Isaacs  equation,  and  whence  the  state  dependence  of  the  change  of 
measure  was  much  more  involved. 


4.2.1  IID  normal  random  variables 


Assume  that  {Yi,  I2, . . .}  is  a  sequence  of  iid  N{0, 1)  random  variables,  and 


Suppose  we  are  interested  in  estimating  P{Xn  G  A}  for  the  non-convex  set 


A  =  (—00,  o]  U  [b,  00) 


with  a  <  0  <  6.  One  can  write  A  in  the  form  of  (4.1)  by  taking  ai  =  Pi  =  a 
and  02  =  P2  =  b. 

The  importance  sampling  algorithm  based  on  a  generalized  subsolu¬ 
tion/control  (W^,  pI,  Ok)  is  as  follows.  We  simulate  {Y,  Y2,  •  •  •}  recursively. 
Let  Aq  =  0  and 


1 

n 


EA 


By  (2.5),  the  conditional  distribution  of  given  Xj  =  x  is 


k=l 


19 


For  numerical  experimentation,  we  take  a  =  —0.25,  b  =  0.2,  and  run 
simulations  for  n  =  100,  200,  500.  The  mollification  parameter  6  is  set  as 
0.02.  Each  estimate  consists  of  20,000  simulations. 


n  =  100 

n  =  200 

n  =  500 

Theoretical  value 

2.90  X  10-^ 

2.54  X  10-^ 

3.88  X  10-'’ 

Estimate 

2.87  X  10“^ 

2.50  X  10“^ 

3.92  X  10“'’ 

Standard  Error 

0.03  X  10“^ 

0.04  X  10“^ 

0.08  X  10“'’ 

95%  C.I. 

[2.81,2.93]  X  10"^ 

[2.42,2.58]  X  10-‘^ 

[3.76,4.08]  X  10-'’ 

Table  2.  P{Xn  G  A}  with  one-dimensional,  non-convex  A. 


4.2.2  A  finite  state  Markov  chain 

Consider  a  two- node  tandem  Jackson  network  with  arrival  rate  A  and  con¬ 
secutive  rates  We  assume  the  queueing  system  is  stable,  that  is, 

X  <  Hi  A  iJ,2-  The  system  has  finite  buffers  of  size  Bi  and  B2,  respectively. 

The  embedded  time-homogeneous  discrete-time  Markov  chain  is  T  = 
{Yi  =  (YX^Y^),  i  G  No},  representing  the  queue  lengths  of  the  nodes  at  the 
epochs  of  transitions  in  the  network.  This  process  has  the  finite  state  space 
^  =  {(2/1)2/2)  :  Hi  =  0, 1, . . . ,  i  =  1,2}.  It  is  assumed  that  the  system 
is  initially  empty,  i.e.,  Yq  =  =  (0,0).  The  transition  probability 

matrix  of  Y  is  denoted  by  P. 

We  are  interested  in  estimating  a  class  of  probabilities  associated  with 
buffer  overflow.  More  precisely,  define  g  :  S  {0,1}^  by 


din)  (^{yi=Bi})  ^{j/2=B2}) 
for  every  y  =  {yi,y2)  e  S.  Let 


-  n— 1 


i=0 


We  wish  to  estimate  P{Xn  G  A},  where  A  takes  the  form 


A  =  {{xi,X2)  ■  xi>  Si  or  X2  >  £2} 


for  some  0  <  £1,  £2  <  1- 

For  each  A:  =  1,2,  let  Pk  G  be  the  minimizer  of  L{P)  over  the  half 
space 

Hk  =  {{xi,X2)  :  Xk  >  £fc}. 
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If  ttfc  £  is  the  convex  conjugate  of  Pk,  then  we  can  write  A  in  the  form 
of  (4.1),  that  is, 


A  =  {x  :  (x,  au)  >  {Pk,  ak)}  ■ 

The  importance  sampling  algorithm  based  on  a  generalized  subsolu¬ 
tion/control  {W^,  pI,  ak)  is  as  follows.  We  simulate  {Yq,  Ti, . . .}  recursively, 
with  initial  state  Yq  =  (0,0).  Let 


Thanks  to  (2.5),  the  conditional  distribution  of  L)+i,  given  Xj  =  x  and 
Yj  =  y,  is  the  mixture 

2 

'^Pkix,j/n)Pkiy,-), 

k=l 

where  Pk  is  a  transition  probability  matrix  defined  by  (2.1)  and  (2.2); 

Pk{y,z)  =  y,zeS. 

r[y,ak) 

In  the  numerical  simulation  we  take  Bi  =  B2  =  6,  and  A  =  0.2,  pi  = 
P2  =  0.4,  and  set  ei  =  0.3,  62  =  0.4.  The  mollification  parameter  is  chosen 
as  6  =  0.1.  We  run  simulations  for  n  =  50,80,110,  and  each  estimate 
consists  of  20,000  replications.  The  theoretical  values  are  obtained  using  a 
recursive  algorithm  and  presented  for  comparison. 


n  =  50 

n  =  80 

n  =  no 

Theoretical  pn 

5.15  X  10-« 

3.47  X  10-^^ 

1.83  X  10-^^ 

Estimate 

5.05  X  10-y 

3.39  X  10"^^ 

1.87  X  10"^^ 

Standard  Error 

0.21  X  10"^ 

0.13  X  10~^^ 

0.08  X  lO”^'’ 

95%  C.I. 

[4.63,5.47]  X  10-« 

[3.13,3.65]  X  10-^^ 

[1.71,2.03]  X  lO-^'’ 

Table  3.  P{X„  G  A}  with  two-dimensional,  non-convex  A. 


4.3  Example:  Estimating  an  expectation  E'exp{—nF(X„)} 

Although  the  estimation  of  E  exp{—nF{Xn)}  is  a  generalization  of  the  prob¬ 
lem  of  estimating  probabilities,  as  we  saw  in  Section  3.2.2,  the  principle  of 
constructing  generalized  subsolution/control  is  unchanged.  Below  we  give  a 
simple  numerical  experiment  for  illustrative  purposes. 
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Consider  a  sequence  of  iid  iV(0, 1)  random  variables  {Yi,Y2, . . .}  and  let 


Xn 


1 

n 


We  wish  to  estimate  i?exp{— nF(X„)}  where  F  is  defined  as  follows: 


F{x)  =  { 


a{x  +  1)  +  1 

1 

—b{x  —  1)  +  1 


if  X  <  —(1  + 
if  — (1  +  o”^)  <  X  <  — 1, 
if  — 1  <  X  <  1, 
if  1  <  X  <  1  +  b~^, 
if  X  >  1  +  b~^, 


and  where  a,  b  are  positive  constants.  Under  these  assumptions,  we  have 
H{a)  =  o?  12  and  U(/3)  =  /3^/2,  and 

lim  —  logUexp{— nF(X„)}  =  —  inf  [F(/3)  +  L(/3)]  =  —7. 

n^oo  n  /teIR 


Note  that  F  can  be  written  as  Gi  A  G2  A  G3,  with 


Gi(x)  =  (ax  +  o  +  1)^,  G2(x)  =  1,  G3(x)  =  (bx  —  b  —  1) 


all  convex  functions.  For  each  A:  =  1,  2,  3,  let  f3k  be  the  minimizer  of  L(f3)  + 
Gk{P)  over  /3  G  and  ak  the  convex  conjugate  point  of  (3k,  whence  ak  = 
(3k-  Then  equations  (3.2)  and  (3.3)  hold,  and  one  can  follow  the  general 
construction  detailed  in  Section  3.2.2. 

Let  (Wk,ak)  be  the  subsolution/control  pair 

Wk(x,  t)  =  -2  {ak,x)  +  2[Gk((3k)  +  {oik,Pk)]  -  2(1  -  t)H(ak). 

Their  minimum  W  =  lUi  A  1^2  A  IL3  is  a  (weak  sense)  subsolution  that 
satisfies  the  terminal  condition  lU(x,  1)  <  2F(x)  [Remark  2.4].  Furthermore, 

VF(0,  0)  =  2  inf  [Gk{(3k)  +  L((3k)]  =  2  inf  [F(/3)  +  L((3)]  =  27, 

k  p 

the  optimal  decay  rate. 

A  generalized  subsolution/control  (W^ ,  a^)  obtained  as  in  Section  3.3 

induces  the  following  importance  sampling  scheme.  Let  Xq  =  0,  and 


1 

n 
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The  sequence  {Yi,y2,  •  •  •}  is  simulated  recursively  so  that  the  conditional 
distribution  of  d^+i,  given  Xj  =  x,  is  the  mixture  of  normal  distributions 

V  ^vr 

In  the  numerical  simulation,  we  take  a  =  3/2,  6  =  4,  and  it  is  easy 
to  check  that  ai  =  /3i  =  —3/2,  a2  =  P2  =  0,  and  03  =  /^s  =  5/4.  The 
mollification  parameter  6  is  set  to  0.1.  We  run  simulations  for  n  =  10,  20,  30, 
with  20,000  simulations  for  each  estimate. 


n  =  10 

n  =  20 

0 

CO 

II 

Theoretical  value 

1.03  X  10-* 

1.87  X  10““ 

5.63  X  10-"^ 

Estimate 

1.02  X  10-* 

1.86  X  10“® 

5.73  X  10-^^ 

Standard  Error 

0.01  X  10-* 

0.03  X  10-» 

0.09  X  10-^^ 

95%  C.I. 

[1.00, 1.04]  X  10-* 

[1.80, 1.92]  X  lO"'* 

[5.58,5.91]  X  10-^^ 

Table  4.  E  exp{—nF{Xn)}  for  one-dimensional,  non-convex  F. 


4.4  Example:  Level  crossing 

In  this  section  we  consider  importance  sampling  estimates  for  level  crossing 
probabilities.  To  illustrate  the  main  idea,  we  specialize  to  the  following 
setup.  Let  {Yi,  Y2, . . .}  be  a  sequence  of  iid  random  vectors  taking  values  in 
with  common  distribution  fj,,  and  A  a  Borel  set.  Define  the  partial 
sum 

n 

Sn  =  ^Yi 
i=l 

with  S'o  =  0,  and  for  every  real  number  z  >  0  let 

Tz  =  inf  >  0  :  -Sn  G  • 

Under  certain  conditions,  {Tz  <  00}  is  a  rare  event  for  large  2;  and  its 
probability  will  decay  exponentially  in  the  sense  that 

lim  -  log  P{Tz  <  00}  =  —7 

Z—^OO  z 

for  some  7  >  0.  The  simplest  example  is  when  {!/}  are  iid,  one  dimensional 
random  variables  with  a  negative  expectation  and  A  =  (l,oo).  Naturally, 
the  question  is  how  to  estimate  P{Tz  <  00}  for  large  2;. 
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The  theoretical  framework  presented  in  [6]  does  not  cover  this  case  -  for 
example  the  time  horizon  is  now  infinite  (see  Remark  4.2  for  more  informa¬ 
tion).  However,  the  use  of  subsolutions  still  carries  over  and  leads  to  simple 
and  efficient  importance  sampling  algorithms,  and  we  will  outline  their  use 
in  the  next  few  paragraphs. 

Let  H  be  the  log- moment  generating  function  for  Yi,  and  let  L  be  the 
Legendre  transform  of  H.  It  is  not  hard  to  argue  that  the  Isaacs  equation 
associated  with  level  crossing  problems  is  of  the  same  form  as  (2.3),  except 
that  there  is  no  time  dependence.  In  other  words,  the  Isaacs  equation  is 

sup  inf  BI(DVL;  a, /3)  =  0  (4.2) 

and  with  boundary  condition  W{x)  =  for  x  ^  A.  Here  W  :  — >  M  and 

DW  is  its  gradient,  and  as  before 

BI(s;  a,  (5)  =  (s,  (5)  +  L{(5)  +  (a,  P)  -  H (a) 

for  s,a,P  &  A  generalized  subsolution/control  {W,pk,ak)  is  defined  in 
a  completely  analogous  fashion  to  Definition  2.1. 

For  a  given  generalized  subsolution/control  iW ,  pk,  ak),  the  correspond¬ 
ing  importance  sampling  algorithm  is  as  follows.  Fix  the  parameter  z.  Let 
Xq  =  0.  Given  Xj  =  x,  we  simulate  under  the  distribution 

K 

'^Pk{x)P{dy;ak{x)), 

k=l 

where  P{dy;  a)  is  the  exponential  twist 

P{dy,a)  = 


and  with  p,  the  distribution  of  Yi.  Finally,  we  update  the  dynamics  and  let 


Xj+i  -  Xj  +  -Yj+i- 


The  simulation  will  be  stopped  at  time 

Tz  =  inf  |n  >  0  :  Xn  £  =  inf  |n  >  0  :  -Sn  G  A 

and  one  forms  the  importance  sampling  estimator 

K  _  _  _  ■ 


=  1 


Y-i 

{T.<oo}  •  n 

3=0 


.k=l 


-1 
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The  performance  of  this  importance  sampling  algorithm  can  be  char¬ 
acterized  by  a  result  analogous  to  Theorem  2.2,  except  that  the  terminal 
condition  (2.7)  should  be  replaced  by  the  boundary  condition 

W{x)  <  0  (4.3) 

for  all  X  ^  A.  Therefore,  the  goal  is  to  construct  a  generalized  subsolu¬ 
tion/control  iW ,  pk,ak)  of  a  simple  form  that  satisfies  the  boundary  condi¬ 
tion  (4.3)  and  such  that  1T(0)  is  as  close  as  possible  to  the  optimal  decay 
rate  27. 

The  construction  follows  the  same  path,  that  is,  one  first  identifies  a 
class  of  affine  subsolution/control  pairs  and  then  builds  a  generalized  subso¬ 
lution/control  by  mollifying  the  minimum  of  such  affine  subsolution/control 
pairs  as  was  done  Section  3.3.  The  class  of  affine  subsolution/control  pairs 
that  serve  as  building  block,  again  denoted  by  A,  takes  a  different  form  in 
this  setting: 

A  =  |(lT,d)  :  Wix)  =  -2(a,x}+c,  a  G  R^,H{a)  <  0,  c  G  m}  . 

It  is  not  difficult  to  check  that  every  {W,  d)  G  A  is  indeed  is  subsolu¬ 
tion/control  pair,  since 

inf  m{DW-,a,f3) 


> 

Analogous  to  the  discussion  in  Section  3.2,  suppose,  for  example,  that 
there  exists  AT  G  N  so  that 

A  C  {x  :  {x,  oik)  >  Ck} 

for  some  Cfc  G  M  and  Ok  G  such  that  H{ak)  <  0.  Then  for  each  k,  one 
can  define  an  affine  subsolution/control  pair  {Wk,ak)  G  A  with 

Wk{x)  =  -2{ak,x)  +  2ck. 

The  minimum  of  {114}  is  a  weak  sense  subsolution  to  the  Isaacs  equation 
(4.2)  and  it  satisfies  the  boundary  condition  (4.3).  One  then  mollifies  in 
order  to  obtain  a  generalized  subsolution/control  (IT'^,  dfc). 


inf  [(-2a,  4) -I- L(/3) -f  (a,  4)  -  i7(a)] 
/3eM‘^ 

inf  [{—a,j3)  +  L{j3)]-H{6i) 

/3eM‘^ 

-2H{a) 

0. 
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Remark  4.2  In  the  theoretical  analysis  one  needs  to  “bonnd”  the  infinite 
time  horizon  in  a  certain  way  -  essentially  to  jnstify  an  approximation  by 
a  finite  time  problem  -  and  then  apply  a  verification  argument  similar  the 
one  used  in  [6] .  More  details  can  be  found  in  [3] ,  where  analysis  of  this  type 
is  carried  out  for  the  problem  of  estimating  buffer  overflow  probabilities  in 
queueing  networks. 

We  next  present  two  examples.  The  first  example  is  a  one  dimensional 
level  crossing  problem,  which  has  been  studied  extensively  [12,  11,  1],  and  we 
will  see  how  the  subsolution  approach  leads  to  the  commonly  used  change 
of  measure.  The  second  example  was  studied  in  [10],  where  it  was  used  as 
a  counterexample  to  illustrate  the  danger  of  blindly  following  the  standard 
heuristic  approach  to  importance  sampling. 

For  each  example  the  numerical  experiment  considers  exponential  ran¬ 
dom  variables.  There  are  two  reasons  for  choosing  the  exponential  distribu¬ 
tion.  One  is  that  an  assumption  we  used  very  often  to  facilitate  the  analysis 
(e.g.,  in  [4,  5,  6])  is  that  the  log  moment  generating  function  H  is  finite 
everywhere.  This  is  not  true  for  exponential  distributions,  and  as  we  will 
see  in  fact  it  is  not  necessary.  The  second  reason  is  that  for  exponential  dis¬ 
tributions  the  level  crossing  probabilities  can  be  explicitly  computed,  and 
these  theoretical  values  can  be  used  for  comparison  to  indicate  the  accuracy 
of  the  importance  sampling  estimates. 

4.4.1  One  dimensional  level  crossing 

Assume  that  {Yi,  12)  •  •  •}  are  iid  random  variables  with  common  distribution 
IJ.  and  that  E[Yi]  <  0.  Let  5^  =  Yi  -|-  •  •  •  -|-  be  the  partial  sum.  Let 
A  =  (1,  oo)  and 

Tz  =  inf  {n  >  0  :  5n  G  zA}  =  inf  {n  >  0  :  S'n  >  2;}  . 

Let  H  be  the  log  moment  generating  function 

H{a)  =  log  f  e'^^nidy), 

Jr 

and  define  a  to  be  the  unique  positive  solution  to  H (a)  =  0.  It  is  well  known 
that,  under  mild  conditions, 

lim  -logPfT^  <  oo}  =  —a, 

Z—^OO  z 
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or,  7  =  0.  It  obvious  that 


A  C  {x  :  X  ■  a  >  a}, 

and  thus 

W  (x)  =  —2ax  +  2a, 

and  a  are  a  subsolution/control  pair  which  also  satisfies  the  boundary  con¬ 
dition  VF(x)  <  0  for  X  €  A.  Note  that  W{0)  =  2a  =  27,  whence  the  cor¬ 
responding  importance  sampling  algorithm  is  asymptotically  optimal.  This 
subsolution/control  pair  induces  a  change  of  measure  that  coincides  with 
the  classical  choice,  that  is,  the  algorithm  simulates  iid  {Yi}  with  common 
distribution 

u{dy)  = 

For  numerical  experimentation,  we  consider  the  special  case  where  for 
some  constant  9  >  Q,  Yi  +  9  \s  exponentially  distributed  with  parameter  A. 
The  assumption  of  E\Yi]  <  0  is  equivalent  to  0A  >  1.  A  bit  of  algebra  yields 
that  a  is  the  unique  positive  root  to  the  equation 

0  =  i?(Q!)  = —q;0 -|- log  A  —  log(A  —  a),  (4-4) 

and  that  the  simulation  distribution  is 

v{dy)  =  (A  - 

Thus  the  {Y  +  G}  are  iid  exponentially  distributed  with  parameter  A  —  d.  It 
is  not  difficult  to  show  that  EYi  >  0,  and  thus  is  finite  with  probability 
one. 

Below  is  a  numerical  result.  We  take  X  =  1,  9  =  2,  and  run  simulations 
for  m  =  10,  20, 30.  Each  estimate  uses  20,000  simulations.  The  theoretical 
values  are  obtained  by  explicitly  solving  an  associated  integral  equation  (we 
omit  the  details).  Indeed, 

P{Y  <  00}  =  (4.5) 

The  value  of  a  is  obtained  by  numerically  solving  equation  (4.4)  using  the 
bisection  method,  and  a  ^  0.80. 


m  =  10 

m  =  20 

m  =  30 

Theoretical  value 

7.04  X  10"^ 

2.44  X  lO-'* 

8.44  X  10“^^ 

Estimate 

7.11  X  10-*’ 

2.44  X  10-» 

8.30  X  10-^^ 

Standard  Error 

0.07  X  10-^ 

0.02  X  lO-*^ 

0.08  X  10“^^ 

95%  C.I. 

[6.97,7.28]  X  10-=* 

[2.40,2.48]  X  lO"'* 

[8.14,8.46]  X  10“-^^ 

Table  5.  Estimating  probability  of  level  crossing  for  1-dim  random  walk. 
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4.4.2  Two  dimensional  level  crossing 


Let  {Yn  =  {Y^,Y^),n  G  N}  be  a  sequence  of  iid  random  vectors  with 
E[Y^]  <  0,  E[Y^]  <  0,  and  common  distribution  (i.  As  before,  denote 
the  partial  sum  by 

n 

Sn  =  {slsl)  =  Y,y^■ 

i=l 


Let 


A  =  {x  =  (xi,  X2)  :  xi  >  1  or  X2  >  1}  . 


It  follows  that 


Tz  =  inf  {n  >  0  :  G  zA}  =  inf  >  0  :  >  z  or  >  z}  . 

Let 

H{a)  =  log  f  fj,{dy) 

JM2 

and  let  di  =  (71,0)  and  0:2  =  (0)72)  where  7^  >  0  is  the  unique  positive 
number  such  that  El  {oik)  =  0,  A;  =  1,  2.  Under  mild  conditions,  we  have 

lim  -  log  P{Tz  <  00}  =  -7  =  -71  A  72. 

2^00  2^ 

We  have 

A  C  {x  :  {x,  di)  >  71}  U  {x  :  {x,  012)  >  72}  • 

For  each  A;  =  1,2,  an  affine  subsolution/control  pair  is  {Wk,ak)  with 

Wk{x)  =  -2{ak,x)  +  27fc. 


If  lU  =  lUi  A  IU2,  then  lU  is  a  subsolution  in  a  weak  sense  that  satisfies 
the  boundary  condition  (4.3).  Also  note  that  IU(0)  =  2(71  A  72)  =  27,  the 
optimal  decay  rate.  A  generalized  subsolution/control  (IU'^,p|,dfc)  is  then 
be  obtained  by  mollification,  and  the  corresponding  importance  sampling 
algorithm  is  as  follows.  Let  Sn  =  Yi  +  ■  ■  ■  +  Yn-  We  recursively  simulate 
{Li,  y2)  •  •  •}  such  that  given  Sj/z  =  x,  the  conditional  distribution  of  I^+i 
is  the  mixture 

2 

^pUx)P{dy;ak), 

k=l 


where 

P{dy;ak)  =  y{dy). 
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We  stop  the  simulation  once  the  process  {Sn/z}  reaches  the  set  A. 

For  the  purpose  of  numerical  experimentation,  we  consider  the  special 
case  where  for  some  constants  9k  >  0,  the  distribution  of  +  0k  is  ex¬ 
ponential  with  parameter  A^,  k  =  1,2.  Assnme  9k^k  >  1)  or  equivalently 
E[Yi]  <  0,  for  every  A:  =  1,2.  We  also  assnme  that  {Y^}  and  {Y^^}  are  in¬ 
dependent  sequences.  Under  these  conditions,  for  each  k,  'fk  >  0  is  nniqnely 
determined  and  satisfies  the  equation 

0  =  -7fc6'fc  +  log  Afc  -  log(Afc  -  7fc),  (4.6) 


and 

P{dy,  di)  =  (Ai  - 

P{dy,  d2)  =  •  (As  -  72)e-(^2-72)(j/2+e2)^y2^ 

Below  is  a  numerical  result.  We  take  Ai  =  As  =  1,  and  6i  =  2,  62  =  3. 
We  rnn  simulations  for  m  =  10,  20, 30,  and  each  estimate  consists  of  20,000 
simulations.  The  theoretical  values  can  be  easily  obtained  from  the  one¬ 
dimensional  formula  (4.5).  Again,  the  value  of  7^  is  obtained  by  numeri¬ 
cally  solving  equation  (4.6)  using  the  bisection  method,  with  71  0.80  and 

72  ~  0.86.  It  is  worth  pointing  ont  that  the  importance  sampling  estimate 
suggested  by  the  standard  heuristic,  which  is  to  simulate  iid  {Y)}  with  dis¬ 
tribution  P{dy,  di),  will  have  unbonnded  variance  as  2:  tends  to  infinity  [10, 
Theorem  2(i)].  The  mollification  parameter  6  is  taken  as  0.1. 


m  =  10 

m  =  20 

II 

CO 

0 

Theoretical  value 

9.51  X  10-^ 

2.88  X  10-» 

9.24  X  10-"-^ 

Estimate 

9.56  X  10-^ 

2.87  X  lO-'* 

9.31  X  10"^^ 

Standard  Error 

0.10  X  10“^ 

0.03  X  10-« 

0.09  X  10“^^ 

95%  C.I. 

[9.36,9.76]  X  10-^ 

[2.81,2.93]  X  lO"'* 

[9.13,9.49]  X  10“^^ 

Table  6.  Probability  of  level  crossing  for  2-dim  random  walk. 


4.5  Example:  A  path-dependent  event 

Let  {Yi,  12)  •  •  •}  be  a  sequence  of  iid  random  variables  with  common  distri¬ 
bution  n  and  E\Yi\  =  0.  As  before,  let  PI  be  the  log-moment  generating 
function  and  L  its  convex  conjugate.  Fix  n  G  N,  and  for  1  <  i  <  n  define 


Ai  = 


1  * 

n 


i=i 
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with  Xq  =  0.  We  are  interested  in  estimating 


En  —  E 


^-nF{X„) 


1 


{inaxQ<  j< 


Xi>h} 


where  /i  >  0  is  a  given  constant.  Let  AC[0, 1]  denote  the  collection  of  all 
absolutely  continuous  functions  on  [0,1].  Assume  that  the  large  deviation 
limit  ^ 

lim  -  log  En  =  -7 

n^oo  fi 

holds,  with  7  the  solution  of  the  following  variational  problem: 

7  =  inf  /  f  dt  +  F(0(1))  :  </>  G  AC[0, 1],  max  >  h,  (^(0)  =  ol  . 

[Jo  o<t<i  j 

(4.7) 

To  write  down  the  Isaacs  equation  associated  with  this  estimation  prob¬ 
lem,  we  need  to  expand  the  state  space  to  accommodate  the  path-dependence 
of  the  event.  More  precisely,  the  state  process  will  be  (Aj,  Bi),  where 


is  the  indicator  of  whether  or  not  the  “barrier”  h  has  been  breached  by 
step  i.  This  problem  can  then  be  thought  of  as  a  combination  of  the  level 
crossing  problem  of  Section  4.4  and  the  finite  time  problem  of  Section  3.2. 
First  consider  the  problem  conditioned  on  =  1.  In  this  case  the  large 
deviation  problem  takes  exactly  the  same  form  as  in  Section  3.2,  and  the 
subsolutions  of  interest  are  characterized  by 

lTi(l,  X,  t)  sup  inf  M(DxW (1,  x,  t);  a,  /3)  >  0 

«eK/t6lL 

with  H  be  as  defined  in  (2.4),  together  with  the  terminal  condition 

W{l,x,t)  <2E{x).  (4.8) 

An  appropriate  subsolution  will  give  us  a  good  importance  sampling  scheme 
for  use  at  all  times  after  the  threshold  is  crossed.  The  question  then  is 
to  identify  the  importance  sampling  scheme  to  use  before  the  threshold  is 
crossed.  Let  us  suppose  that  as  soon  as  the  threshold  is  crossed  we  switch 
to  the  scheme  associated  with  W{l,x,t),  so  that  W{l,x,t)  identifies  an 
upper  bound  on  the  performance  after  this  time.  We  are  therefore  back  in 
the  setting  of  the  level  crossing  problem,  though  there  is  now  an  exit  east 
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W{1,  h,t)  depending  on  the  (scaled)  time  that  the  barrier  is  crossed.  Hence 
the  subsolution  for  times  prior  to  crossing  the  barrier  is  also 

Wt(0,  X,  t)  +  sup  inf  MiDxW (0,  x,  t);  a,  /3)  >  0 
«eK/3ei; 

(where  the  time  derivative  is  needed  because  the  exit  cost  depends  on  time) , 
together  with  the  boundary  condition 

W{0,x,t)  <W{l,x,t)  (4.9) 


for  X  >  1,  t  G  [0, 1]. 

Generalized  subsolution/controls  to  these  equations  are  defined  as  in 
Sections  4.4  and  3.2,  and  the  corresponding  importance  sampling  algorithm 
is  as  follows.  Let  Xq  =  0  and  Bq  =  0.  Given  Xj  =  x  and  Bj  =  b,  we 
simulate  Yj+i  under  the  distribution 

K 

'^Pk{b,x,j/n)P{dy,ak{b,x,j/n)) 

k=l 

where  P{dy]  a)  is  the  exponential  twist 

P{dy;a)  = 

Then  we  update  the  dynamics  by 

^j+l  —  ~  l{maxo<i<j+i 

The  performance  of  this  importance  sampling  algorithm  can  be  charac¬ 
terized  by  an  analogous  result  to  Theorem  2.2.  In  particular,  if  is  the 
second  moment  of  a  single  sample  of  the  importance  sampling  estimator 
corresponding  to  W,  then 

lim  inf  -  -  log  >  IT(0, 0, 0) . 

n—^oo  fl 

Therefore,  the  goal  is  to  find  a  structurally  simple  generalized  subsolu¬ 
tion/control  {W,pk,C(k)  satisfying  (4.9)  and  (4.8)  with  the  value  oflT(0,0,0) 
as  large  as  possible,  preferably  equal  to  the  optimal  decay  rate  2'y. 

For  illustration,  we  will  consider  the  simple  setting  where 

En  =  T*  <  max  Xi  >  h,  X^  <  I 

[0<i<n 
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for  some  constant  0  <  I  <  h.  In  other  words,  F(a:)  =  oo  for  re  >  /  and  0 
otherwise.  As  in  the  cases  studied  previously,  a  subsolution  can  be  identified 
in  terms  of  the  solution  to  the  large  deviation  variational  problem.  Using 
convexity  and  Jensen’s  inequality,  this  problem  can  be  written  in  the  form 

inf  i^poL  +  piL  :  Pi  >  0,z  =  0,  l,po  +  Pi  =  l|  • 

Since  the  mean  of  p  is  zero,  A(/3)  =  0  if  and  only  if  /3  =  0,  and  thus 
the  infimum  is  achieved  at  p*  with  pt  >  0  for  f  =  1,  2.  Let  Pq  =  h/p^  and 
PI  =  {l  —  h)/p\,  and  let  and  be  the  convex  conjugates,  respectively.  We 
claim  that  H{aQ)  =  H{a\).  Indeed,  the  necessary  condition  for  a  minimizer 
gives 

L  (/?*)  -  L'  iP*)  P*  -  L  iPl)  +  L'  iPl)  PI  =  0 
Using  the  characterization 

=  iPt), 

we  have 

H{al)  =  L'  {PD  Pl-L  {P*o)  =  L'  {PD  PI  -  L  {PD  =  H{a\). 

Using  that  the  interpretation  of  W{l,x,t)  as  the  solution  to  the  finite 
time  problem  with  the  given  terminal  condition,  we  know  from  Section  3.2 
that  a  subsolution  is  given  by 

IU(I,  rc,  t)  =  -2a\x  +  2lal  -  2(1  -  t)H{a\). 

As  subsolution  for  the  times  prior  to  exceeding  h  we  use  the  form 

IU(0,  X,  t)  =  -2alx  +  Co  -  2(1  -  t)H{al). 

Since  H{aQ)  =  H{a\),  to  satisfy  the  boundary  condition  we  need  —2aQh  + 
Co  <  — 2aj/i  +  2Plal,  and  to  obtain  the  largest  value  for  IU(0,0, 0)  we  take 
Co  =  2(aQ  —  al)h  +  2la,  so  that  the  two  functions  agree  on  x  =  h.  It  follows 
that 

IU(0,0,0)  =  [W{0,0,0)-WiO,h,pl)]  +  [Wil,h,p*o)-W{l,l,l)] 
+W{ip,l) 

=  2alh  -  Po2H{a*o)  +  2al{l  -  h)  -  p\2H{a\) 

=  2pl  [alPl  -  2H{al)]  +  2p\  [a\Pl  -  2H{a\)] 

=  27. 
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In  other  words,  the  corresponding  scheme  is  asymptotically  optimal. 

For  a  numerical  example  we  take  Yi  ~  iV(0, 1).  The  corresponding  im¬ 
portance  sampling  algorithm  takes  a  very  simple  form.  Let  Xq  =  0  and 
Bq  =  0.  If  Bj  =  0,  that  is,  the  sample  path  maximum  has  not  yet  snrpassed 
barrier  h,  we  simulate  under  the  distribution  N{2h  —  1, 1).  If  Bj  =  1, 
that  is,  the  barrier  h  has  already  been  reached,  we  simulate  under  the 
distribution  N{1  —  2h,  1).  Then  we  update  the  dynamics  by 

^j+l  —  ^j+l  —  l{maxo<i<j+i 

In  the  numerical  experiment  below,  h  =  1  and  I  =  0.8.  Simulations  were 
run  for  n  =  10,  20, 30,  and  each  estimate  consists  of  20,000  samples.  What 
we  call  the  “theoretical  valne”  is  an  estimate  based  on  1  billion  samples  of 
the  importance  sampling  scheme. 


n  =  10 

n  =  20 

n  =  30 

Theoretical  value 

1.68  X  10"'' 

9.66  X  10"'^ 

6.09  X  10“^^ 

Estimate 

1.74  X  10“^ 

9.58  X  10-« 

6.26  X  10“^^ 

Standard  Error 

0.04  X  10“^ 

0.27  X  lO-'^ 

0.19  X  10-^^ 

95%  C.I. 

[1.66, 1.82]  X  lO-'' 

[9.04, 10.12]  X  lO"'* 

[5.88,6.64]  X  10“^^ 

Table  7.  Estimating  a  path-dependent  probability. 


4.6  Example:  A  mixed  open-closed  queueing  network 

Consider  the  mixed  open  and  closed  queueing  network  as  shown  in  Fignre 

2. 


Figure  2.  An  open/closed  queneing  network. 
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The  open  jobs  arrive  at  server  1  according  to  a  Poisson  process  with  rate 
A.  There  is  one  closed  job  that  circulates  between  server  1  and  server  2, 
and  it  has  pre-emptive  priority  over  open  jobs  at  server  1.  All  services  rates 
are  exponentially  distributed.  The  service  rates  at  server  1  are  /rn  for  open 
jobs  and  ni2  the  closed  job,  and  the  service  rate  at  server  2  is  ^2-  Note 
that  this  system  is  equivalent  to  an  M/M/1  queue  with  server  breakdowns 
or  vacations. 

The  state  of  the  system  is  described  by  process  {(it,  Zt)  :  t  >  0},  where 
it  and  Zt  are  the  numbers  of  open  jobs  and  closed  jobs  at  server  1  at  time 
t.  We  wish  to  estimate  pn,  the  probability  that  the  number  of  open  jobs 
reaches  n  before  the  system  returns  to  state  (0,0),  given  that  the  system 
starts  in  (0, 0). 

In  general,  the  system  can  have  multiple  closed  jobs.  Even  though  we 
only  consider  the  case  of  one  closed  job,  the  approach  works  for  the  general 
case,  with  the  sole  difference  being  that  the  computation  of  7  becomes  more 
involved. 

The  following  stability  assumption  will  be  in  effect  throughout  this  sec¬ 
tion: 


1^11  1^2  +  M12 


<  1. 


(4.10) 


Under  this  assumption,  the  system  is  stable,  and  pn  is  a  rare-event  proba¬ 
bility  when  n  gets  large. 

The  associated  large  deviation  asymptotics  can  be  characterized  by  an 
ODE.  More  precisely,  let  y  G  {0,1,..., n},  z  G  {0,1},  and  Vn{y,z)  the 
probability  that  the  number  of  open  jobs  reaches  n  before  the  system  returns 
to  state  (0,0),  given  that  the  system  starts  in  {y,z)  [whence  pn  =  14(0,0) 
by  definition].  Given  any  x  G  [0, 1]  and  z  G  {0, 1},  we  have 


lim  —  log  Vn(\nx\ ,  z)  =  —vix) 

n—^oo  Tt 


where  u  is  a  viscosity  solution  to  an  ordinary  differential  equation  (ODE). 
To  describe  this  ODE,  we  define  the  convex  function 


£{s) 


slogs  —  s  -1-  1,  s  >  0 
-foo  ,  s  <  0  ’ 


and  introduce  the  notation  p  =  (po,  Pi),  ©  =  (Aq,  Ai,  pn,  /ii2,  /i2)-  For  /3  G  M 
define 

L(/?)  =  infG(p,0),  (4.11) 

P.0 
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where 


G{p,  0)  =  pQ 


+  Pi 


+  Pl2(- 


«It) +«■<(  — )+«<(- 

A  /  \PllJ  \P2 


Pl2 

Pl2 


and  with  the  inhmum  in  (4.11)  taken  over  all  {p,  0)  such  that 
Po  >  0,  /?!  >  0,  po  +  pi  =  1,  PoPi2  =  Pip2,  /3  =  PoAo  +  PiAi  — (4.12) 

One  can  show  that  function  L  is  indeed  convex  and  explicitly  calculate  the 
Legendre  transform  H  of  L.  Indeed,  we  have  (see  the  appendix  for  details) 


H{a)  =  inf[Ho{a,q)  V  Hi{a,q)]  (4-13) 

where 


Ho{a,q)  =  A(e"  -  1)  + /^i2(e'^  -  1), 

Hi{a,q)  =  A(e"  -  1) +  /^ii(e“"  -  1) +  /^2(e“'^  -  1). 

Then  v  satisfies  the  ODE 

0  =  mf  [v'{x)  ■  P  +  L{P)]  =  —H{—v'{x)), 

with  the  boundary  condition  u(l)  =  0.  Solving  this  ODE  (again  see  the 
appendix  for  details),  we  obtain 

u(x)  =  7(l  — x),  (4-14) 

where 

(A  + /xii  + /xi2  + 1^2)  +  v/(X+"/LTT7T2”T7i2p'^-^MiT(X+7^^ 

^  =  - 2«.(1  +  A-Vi2) - 

is  a  positive  number.  In  particular, 

lim  —  logpn  =  — u(0)  =  —7. 

n—^oo  Ti 

Moreover,  the  minimizing  0  for  (4.11)  is 

0*  =  (A5,  pI)  =  (e^A,  e^A,  e'^n,  e'^Vi2,  e-''V2),  (4.15) 
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(4.16) 


where  q*  is  the  minimizer  in  equation  (4.13)  with  a  =  7,  or 


q*  =  log 


A  +  /X12  -  Ae'*' 
1^12 


Now  let  us  consider  the  construction  of  importance  sampling  algorithms. 
We  first  describe  the  associated  Isaacs  equation.  Let  0  =  (Aq,  Ai,  /In,  /I12,  /l2)- 
Define 


L(/3,  0)  =  inf 

P,0 


2G(p,0)-G(p,0,0) 


(4.17) 


where 


G(p,0,0)=po  +/ll2^(^gj 


+  pi 


and  the  infimum  in  (4.17)  is  taken  over  all  (p,  0)  satisfying  the  constraints 
(4.12).  The  Isaacs  equation  associated  with  importance  sampling  can  then 
be  written  as 

sup  inf  [IT'(x)  •  (3  +  L{f3,  0)]  =  0, 
e  P 

with  boundary  condition  W{1)  =  0.  Let  W  =  2v.  It  is  not  difficult  to 
show  (see  the  appendix  for  details)  that  (IT,  0*)  defines  a  affine  subsolu¬ 
tion/control  pair  to  the  Isaacs  equation,  and  it  satisfies  the  terminal  con¬ 
dition  IT(1)  =  2u(l)  =  0.  Furthermore,  IT(0)  =  2u(0)  =  27,  the  optimal 
decay  rate. 

The  importance  sampling  algorithm  corresponding  to  this  affine  subso¬ 
lution/control  pair  (IT,  0*)  is  very  simple.  When  z  =  1,  we  simulate  the 
system  under  the  alternative  probability  measure  such  that  the  open  job 
arrival  rate  is  Aj  and  the  service  rate  for  the  closed  job  at  server  1  is  pj2- 
When  z  =  0,  the  simulation  distribution  is  such  that  the  open  job  arrival 
rate  is  Aq  and  the  service  rate  for  the  open  job  is  nh  and  the  service  rate 
for  the  closed  job  at  server  2  is  P2-  Note  that,  for  this  special  network,  since 
Aq  =  A j  =  A* ,  the  above  change  of  measure  is  equivalent  to  simulation  under 
the  alternative  rates  (A*, ^^2; ^2)- 

In  the  numerical  example  below  we  take  A  =  l,pii  =  4,  pi2  =  2,p2  = 
0.5.  It  is  easy  to  check  that  the  stability  condition  (4.10)  holds.  We  run  sim¬ 
ulations  for  n  =  20, 40, 80,  and  each  estimate  consists  of  20,000  simulations. 
The  theoretical  value  can  be  found  in  [8]. 
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n  =  20 

n  =  40 

II 

00 

0 

Theoretical  value 

3.91  X  10-'* 

2.02  X  10“^'’ 

5.40  X  10“'^" 

Estimate 

3.93  X  10"'^ 

2.01  X  lO”^'" 

5.45  X  10“^“ 

Standard  Error 

0.03  X  10“'^ 

0.02  X  10-^^ 

0.04  X  10-^'^ 

95%  C.I. 

[3.87,3.99]  X  10-» 

[1.97,2.05]  X  10"^^ 

[5.37,5.53]  X  10-^'J 

Table  8.  Overflow  probabilities  of  a  mixed  open-closed  queueing  network. 

4.7  Example:  A  “universal”  importance  sampling  scheme 

For  all  the  examples  we  have  discussed,  finitely  many  affine  subsolution/control 
pairs  are  used  to  construct  a  generalized  subsolution/control.  In  this  sec¬ 
tion,  we  present  an  example  where  infinitely  many  subsolution/control  pairs 
are  used  for  this  purpose.  The  corresponding  importance  sampling  scheme 
has  some  interesting  features,  which  are  further  discussed  in  Remark  4.3. 

For  illustration,  we  consider  again  the  simple  setting  where  {Yi,  1^2;  •  •  •} 
is  a  sequence  of  iid  random  variables  taking  values  in  and  let 


with  Xq  =  0.  We  wish  to  estimate  P{Xn  G  A}  where  R  C  is  a  Borel  set, 
and  assume  a  large  deviation  limit  holds,  that  is, 

lim  —  logP{A„  e  A}  =  —  inf  L{P)  =  —7. 

n— >00  n  p&A 

A  new  way  to  construct  a  subsolution  is  as  follows.  Consider  the  level 
set  of  the  rate  function  L 

0^  =  {/3  G  :  L(/?)  >  7}  . 

It  follows  easily  that  0-^  is  the  complement  of  a  convex  set  and  that  A 
For  each  d  G  90.y,  let  a{B)  be  its  convex  conjugate  (assuming  its  existence). 
Define  {Wp,a{(3))  G  A  by 

WpixA)  =  -2{a{f3),x)  -|-2(a(/3),/3)  -  2(1  -t)H[a{l3)]. 

Let 

W{xA)  =  inf  {Wf}{x,t)  '■  /3  G  90.^}  . 

Then  W  defines  a  (weak  sense)  subsolution  to  the  Isaacs  equation  (2.3). 
Furthermore,  thanks  to  the  convexity  of  0-,,,  we  have 

W{x,  1)  =  inf  {— 2(a(/3), x  —  /9)  :  /3  G  50^}  <  0 
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for  X  G  0^.  In  particular,  II^  satisfies  the  terminal  condition  (2.7)  since 
A  <Z  In  many  cases  (e.g.,  when  L  is  hnite),  we  have 

li^(0, 0)  =  inf  {2(a(/3), /3)  -  2i7[a(/3)]  :  /?  G  50.^} 

=  inf  {2L(/3)  :  f3  G  50.^} 

=  27, 

the  optimal  decay  rate. 

Using  Remark  2.2  and  Theorem  2.2,  if  IT  were  continuously  differentiable 
then  {W,a)  with  a{x,t)  =  —DW{x,t)/‘^  would  form  a  subsolution/control 
pair,  and  the  corresponding  importance  sampling  schemes  would  be  asymp¬ 
totically  optimal  performance.  Since  W  is  not  continuously  differentiable, 
one  possibility  is  to  resort  to  mollification.  However,  an  alternative  that 
is  possible  in  some  cases  is  to  show  that  for  each  e:  >  0  one  can  find  a 
smooth  function  such  that  {W‘^,a)  form  a  subsolution/control  pair  and 
— >  lU  as  e  — >  0.  This  approach  is  used  in  the  following  example. 

To  give  a  concrete  example,  we  will  work  out  the  details  for  iid  Id) 
sequence  {Yi,  12,  •  •  •},  where  Id  denotes  the  identity  matrix  of  dimension  d. 
It  follows  that  H{a)  =  ||q;|P/2,  T(/3)  =  ||/3|p/2,  and  whence 

0^  =  {/?  G  M'"  :  m\  >  . 

In  this  case  a{f3)  =  (5,  and 

W{x,t)  =  inf  {-2(/9,  x)  +  2(/3, /9)  -  2(1  -  :  ||/3||  G  90.^} 

=  inf  |-2(/?,x) -I- 2(1 +  t)7  :  II/3II  = 

=  -2-v/^lkll  +  2(1  +  t)7. 


It  is  not  difficult  to  check  by  direct  computation  that  W  satisfies  the  Isaacs 
equation  (2.3)  except  at  {x  =  0},  and  IU(x,  1)  <  0  on  0.y.  In  particular,  we 
have  W{0, 0)  =  27_. 

Even  though  W  is  not  continuously  differentiable  at  {x  =  0},  it  induces 
a  control 


a(x,  t)  = 


DW{x,t) 

2 


For  X  =  0,  we  just  define 

d(x,O)  =  y/^0,  (4.18) 


where  9  is  an  arbitrarily  fixed  unit  vector.  We  claim  that  the  importance 
sampling  scheme  corresponding  to  this  control  a  is  asymptotically  optimal. 
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Indeed,  consider  the  approximating  seqnence  defined  as  follows.  Let 

W^{x,  t)  =  -2^/^^/\\x\\'^  +  £  +  2(1  +  t)7. 

It  is  not  difficnlt  to  check  that,  for  every  e  >  0,  is  continnonsly  differen¬ 
tiable  and  (II^^,d)  is  a  snbsolntion/control  pair.  The  asymptotic  optimality 
follows  if  one  applies  Theorem  2.2  to  this  snbsolntion/control  pair,  and  ob¬ 
serves  that 

limIT^(0,0)  =  IT(0,0)  =  27, 

£^0 

the  optimal  decay  rate. 

For  numerical  experimentation,  we  take  d  =  2  and 

A  =  {x  =  (xi,  X2)  :  (xi  -t-  o)^  +  X2  >  R^} 

for  some  constants  0  <  a  <  i?.  It  follows  that 

7  =  inf  L(/3)  =  hu-af, 

3&A  2 


with  minimizer  f3*  =  {R  —  a,  0).  Setting  9  =  (1,  0)  in  equation  (4.18),  the  im¬ 
portance  sampling  scheme  is  as  follows.  We  recursively  simulate  {Yi,Y2, . . .} 
and  let 


2=1 


The  conditional  distribntion  of  Yjj^i  given  Xj  =  x  =  {xi,X2)  is 


N 


{R  —  a)xi/||x|| 

{R  -  a)x2/iixii  _  ’ 


if  X  7^  0,  and 


N 


R  —  a 
0 


if  X  =  0. 

For  the  table  below  we  take  R  =  0.5,  a  =  0.05.  We  rnn  simulations 
for  n  =  40,  80, 120,  and  each  estimate  consists  of  20,000  simulations.  The 
theoretical  valne  can  be  obtained  nsing  standard  software  snch  as  S-plus. 


n  =  40 

II 

00 

0 

n  =  120 

Theoretical  value 

8.49  X  10"^ 

1.00  X  10-'‘ 

1.40  X  10“'’ 

Estimate 

8.38  X  10-^ 

1.04  X  10-"‘ 

1.50  X  lO-” 

Standard  Error 

0.18  X  10“^ 

0.04  X  10-"‘ 

0.07  X  10“'’ 

95%  C.I. 

[8.02,  8.74]  X  10"'^ 

[0.96,1.12]  X  10-* 

[1.36, 1.64]  X  10“'’ 

Table  9.  A  “universal”  scheme 
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Remark  4.3  An  important  feature  of  this  approach  is  that  the  construction 
of  the  importance  sampling  scheme  does  not  need  any  information  on  the 
set  A  other  than  7,  the  infimum  of  L{(3)  over  P  G  A.  It  is  for  this  reason 
that  the  scheme  might  be  called  “universal.”  The  scheme  has  strengths  and 
weaknesses.  On  one  hand,  the  scheme  is  simple  and  applicable  in  more 
general  settings.  On  the  other  hand,  it  is  often  the  case  that  one  knows 
some  more  detailed  properties  of  target  set  A,  which  can  be  used  to  design 
more  efficient  schemes.  Furthermore,  there  is  a  practical  computational 
issue  of  obtaining  W  as  the  minimum  of  infinitely  many  subsolutions  in  the 
case  of,  say,  sums  of  functionals  of  a  Markov  chain.  However,  here  one  may 
be  willing  to  approximate  0-,,  by  a  finite  number  of  points  and  then  use 
exponential  weighting  for  mollification. 

4.8  Summary 

We  have  shown  that  importance  sampling  schemes  based  on  subsolutions 
can  be  applied  in  a  wide  variety  of  settings  and  deliver  excellent  perfor¬ 
mance.  Besides  being  fast  and  accurate,  the  behavior  of  the  schemes  is  very 
stable  across  a  broad  range  of  problem  formulations.  For  example,  in  each 
setting  and  for  each  simulation  we  use  the  same  number  of  samples  (20,000) 
with  remarkably  similar  performance.  Moreover  the  asymptotic  behavior  of 
the  schemes  can  be  backed  up  by  rigorous  theoretical  justification.  Both 
of  these  properties  stand  in  sharp  contrast  to  the  instability  and  poor  per¬ 
formance  exhibited  by  standard  heuristic  importance  sampling  schemes  in 
many  situations  [10,  9,  4,  5]. 

5  Appendix.  Computation  for  the  mixed  open- 
closed  network  example 

Proof  of  Equation  (4.13).  We  first  define  two  functions.  Let  6  = 
(5i,  62)  G  and  define 

Similarly,  let  7  =  (71, 72)  G  and  define 

L.O)  =  inf  |Af  [fAj  I 
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where  the  infimum  is  taken  over  all  (Xi,  ftu,  jl2)  such  that 

Xi-fiu  =  7]i,  -l^2  =  m- 

It  is  not  difficult  to  show  by  direct  computation  that  Lj  is  convex  and  its 
convex  conjugate  is  Hi,  for  each  z  =  0, 1.  Given  6  =  {61,62)  £  1^^,  let 

Q{6)  =  inf  {pqLo{8)  +  piLi{r])  :  po  >  0,  pi  >  0,  po  +  pi  =  1,  P08  +  pip  =  6}  . 

Thanks  to  [2,  Corollary  D.4.3],  G  is  a  convex  function  whose  convex  conju¬ 
gate  is  HoV  Hi.  Also,  it  is  easy  to  see  by  definition  that 

L{P)  =  Q{P,0). 

Thus  L  is  convex  and  its  convex  conjugate  is  H. 

Proof  of  Equations  (4.14)  -  (4.16).  Consider  the  equation 

H{a)  =  inf  [ifo(«,  q)  V  Hi{a,  g)]  =  0. 
q 

For  every  fixed  a,  HQ{a,  q)  is  a  strictly  increasing  function  of  q  and  limg^oo  Ho{a,  q) 
+00.  Similarly,  for  each  fixed  a,  Hi(a,q)  is  a  strictly  decreasing  function 
of  q  and  linig^-co  Hi(a,  q)  =  4-00.  It  follows  that,  for  each  fixed  a,  there 
exists  a  unique  q  =  q(a)  such  that 

Ho(a,  q(a))  =  Hi(a,  q(a))  =  H(a). 

Therefore,  solving  H(a)  =  0  is  equivalent  to  finding  (a,  q)  such  that 

Ho(a,q)  =  Hi(a,q)  =  0, 

which  yields  the  7  of  (4.14)  and  the  q*  of  (4.16).  Thus  (j,q*)  satishes 

Ho(7,q*)  =  Hi(j,q*)  =  0.  (5.1) 

(It  turns  out  that  (0, 0)  is  also  a  solution,  but  it  is  elementary  that  this  root 
does  not  characterize  the  relevant  solution  to  the  PDE.)  The  computation 
for  0*  is  straightforward  and  thus  omitted.  ■ 

Proof  that  (1P,0*)  is  a  subsolution/control  pair.  We  only  need  to 
show  that 

inf  [-27/3  + L(/3,0*)]  >0.  (5.2) 
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However,  analogous  to  the  proof  of  equation  (4.13),  one  can  show  that 
L{f3,Q*)  is  convex  with  respect  to  [3  and  its  Legendre  transform,  denoted 
by  H{a),  is 

H{a)  =  inf  \Ho{a,  q)  V  Hi{a,  q)] 

q 

with 

Ho(,a,q)  =  e“  —  2A  +  Aq^  +  —  2^12  +  ^12^ 

Hi{a,  q)  =  —  2A  +  A^^  +  ^  ^  ~ 

Then  the  inequality  (5.2)  reduces  to  —H{2j)  >  0.  Indeed,  we  claim  that 

iL(27)  =  0.  (5.3) 


To  this  end,  note  that 

H{2j)  =  inf  [.Ho (27,  q)  V  Hi  (27,  g)]  . 

However,  by  direction  computation  and  equations  (4.15),  (5.1),  we  have 
Ho (27,  2q*)  =  2A(e^  -  1)  +  2/112(6"*  -  1)  =  2Ho(7,  q*)  =  0 


and 


Hi (27,  2q*)  =  2A(e^  -  1)  +  2fin{e-'^  -  1)  +  2/12(6“"*  -  1)  =  2Hi(7,  q*)  =  0. 


It  is  now  very  easy  to  argue  that 

H(27)  =inf  [Ho(27,c/)VHi(27,g)]  =  Ho(27,  2(/*)  V  Hi(27,  2(/*)  =  0, 

q 

which  completes  the  proof.  ■ 
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